STRONG CONSISTENCY OF M-ESTIMATES
FOR  THE  LINEAR  MODEL 


X. R. Chen and Yeuhua Wu


ABSTRACT 

This article defines the M-estimate for the linear model directly
from the minimization problem

$$\sum_{i=1}^{n} \rho(Y_i - \alpha - \beta' X_i) = \min.$$

Suppose that $(X_1,Y_1), \ldots, (X_n,Y_n), \ldots$ are i.i.d. observations of a
random vector $(X,Y)$, where $Y$ is one-dimensional and $X$ may be
multi-dimensional. It is shown that the M-estimates $\hat\alpha_n$, $\hat\beta_n$
defined in this manner converge with probability one to $\alpha_0$, $\beta_0$
respectively ($(\alpha_0,\beta_0)$ is the true parameter) as $n \to \infty$, under
very general conditions on the function $\rho$ and the distribution of $(X,Y)$.


AMS  1980  subject  classifications:  Primary  62J05;  secondary  62F12. 


Key  words  and  phrases:  M-estimate,  linear  model,  strong  consistency. 


Research sponsored by the Air Force Office of Scientific Research under
Contract F49620-85-C-0008. The United States Government is authorized
to reproduce and distribute reprints for governmental purposes
notwithstanding any copyright notation hereon.


1.  INTRODUCTION 


Let $(X_1,Y_1), \ldots, (X_n,Y_n)$ be i.i.d. observations of a random
vector $(X,Y)$, where $Y$ is one-dimensional and $X$ may be multi-dimensional.
Suppose that the regression of $Y$ on $X$ is, in some sense, a linear function
$\alpha_0 + \beta_0' x$. It is desired to estimate the unknown parameters
$\alpha_0$, $\beta_0$ by using the observations $(X_1,Y_1), \ldots, (X_n,Y_n)$.
A much discussed class of estimates is the so-called M-estimate, which takes
the solution of the minimization problem

$$\sum_{i=1}^{n} \rho(Y_i - \alpha - \beta' X_i) = \min \tag{1}$$

as the estimator. Here $\rho$ is a properly selected function defined over
$R^1 = (-\infty,\infty)$.
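To make the definition concrete, the following minimal Python sketch evaluates the objective in (1) for a single regressor and searches a finite grid of candidate $(\alpha,\beta)$ values. The helper names `objective` and `grid_m_estimate`, the toy data, and the grid are illustrative assumptions, not part of the article. A grid search only approximates the minimizer; it is used here because it needs neither smoothness nor convexity of $\rho$.

```python
from itertools import product


def objective(rho, data, alpha, beta):
    """Sum of rho(y - alpha - beta * x) over the sample (one regressor)."""
    return sum(rho(y - alpha - beta * x) for x, y in data)


def grid_m_estimate(rho, data, alphas, betas):
    """Return the grid point (alpha, beta) with the smallest objective value."""
    return min(product(alphas, betas),
               key=lambda ab: objective(rho, data, ab[0], ab[1]))


# Noise-free toy data on the line y = 1 + 2x (hypothetical values).
data = [(x, 1.0 + 2.0 * x) for x in (-2.0, -1.0, 0.0, 1.0, 3.0)]
grid = [k / 2 for k in range(-8, 9)]  # -4.0, -3.5, ..., 4.0

estimate_abs = grid_m_estimate(abs, data, grid, grid)
estimate_cap = grid_m_estimate(lambda t: min(abs(t), 1.0), data, grid, grid)
print(estimate_abs, estimate_cap)
```

With noise-free data both the absolute-value and the truncated choice of $\rho$ recover the generating line, since the objective vanishes only there.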

In the literature, the case often considered is that in which the $X_i$'s
are known constant vectors rather than observations of some random vector.
But in many applications, especially in problems of econometrics, it is
more practical to assume the random character of the $X_i$'s.

A common feature of most works dealing with this estimation problem,
for example [2], [3] and [6], is to assume that $\rho$ has a continuous
derivative $\psi$ everywhere over $R^1$, thus converting the minimization
problem (1) to the problem of solving the system of equations

$$\sum_{i=1}^{n} \psi(Y_i - \alpha - \beta' X_i) = 0, \qquad \sum_{i=1}^{n} X_i \psi(Y_i - \alpha - \beta' X_i) = 0. \tag{2}$$

In order to validate this procedure, one usually makes the assumption
that $\rho$ is a convex function. These assumptions seem unduly restrictive,
for in some important cases such as $\rho(t) = |t|$ (minimum $L_1$-norm
estimate), $\rho$ is not everywhere differentiable. Also, the convexity
assumption excludes many functions of practical importance, for example
$\rho(t) = \min\{|t|, k\}$ for some constant $k > 0$. In general, no nonconstant
function bounded over $R^1$ is allowed under this assumption.
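The three choices of $\rho$ mentioned so far can be compared with a short Python sketch. It checks numerically, on a few hand-picked points (an assumption of this illustration, not a proof), that the truncated function $\min\{|t|,k\}$ is bounded and violates midpoint convexity, while $|t|$ and $t^2$ satisfy it. All function names are hypothetical.

```python
def rho_abs(t):
    """rho(t) = |t|: convex, not differentiable at 0."""
    return abs(t)


def rho_square(t):
    """rho(t) = t^2: convex and smooth."""
    return t * t


def make_rho_truncated(k):
    """rho(t) = min(|t|, k): bounded by k, hence not convex."""
    return lambda t: min(abs(t), k)


def midpoint_convex_on(rho, points):
    """Check rho((s + t) / 2) <= (rho(s) + rho(t)) / 2 for all pairs in points."""
    return all(rho((s + t) / 2) <= (rho(s) + rho(t)) / 2
               for s in points for t in points)


points = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]
rho_trunc = make_rho_truncated(1.0)

print(midpoint_convex_on(rho_abs, points))     # True
print(midpoint_convex_on(rho_square, points))  # True
print(midpoint_convex_on(rho_trunc, points))   # False
```

The failing pair for the truncated function is, for instance, $s = 0$, $t = 4$: the midpoint value is 1 while the average of the endpoint values is 1/2.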

So it makes good sense to tackle the estimation problem directly,
starting from the minimization problem (1). Some work (for example,
[1], [4], [7]) has been done in this direction for the special case of
$\rho(t) = |t|$, but as far as the authors know, no work exists for general
$\rho$ up to now.

The purpose of the present article is to study this problem in the case
where the $X_i$'s are observations of a random vector. To some extent our
method can also be employed to deal with the case in which the $X_i$'s are
known nonrandom vectors, but some additional assumptions will be needed.


2.  FORMULATION  OF  THE  RESULTS 


In the sequel we shall keep the notation introduced in Section 1.
We shall denote by $(\hat\alpha_n,\hat\beta_n)$ a Borel-measurable solution
of the minimization problem (1).

We shall always impose the following conditions on the function
$\rho$ and the random vector $(X,Y)$:

a. $\rho$ is continuous everywhere over $R^1$.

b. $\rho$ is nondecreasing over $(0,\infty)$ and nonincreasing over $(-\infty,0)$.

From b it is seen that $\rho(0) = \min\{\rho(t): t \in R^1\}$. Without loss of
generality, we may assume that

c. $\rho(0) = 0$, $\rho(t) \ge 0$ on $R^1$. (3)

d. $E\rho(Y - \alpha - \beta'X) < \infty$ for any $\alpha \in R^1$, $\beta \in R^d$. (4)

e. For any $x \in R^d$, the function

$$f_x(\theta) \equiv E\{\rho(Y - \theta) \mid X = x\} \tag{5}$$

attains its minimum uniquely at

$$\theta_x \equiv \alpha_0 + \beta_0' x \tag{6}$$

with $\alpha_0$, $\beta_0$ not depending on $x$.

For convenience of reference, we shall refer to the set of conditions
{a, b, c, d, e} as "Condition (A)".
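Condition e can be illustrated for a fixed $x$ by replacing the conditional distribution of $Y$ with a small discrete distribution. In the sketch below, the support, the probabilities, the grid, and the names `f` and `argmin_on_grid` are assumptions made purely for illustration. It evaluates $f_x(\theta)$ on a grid: with $\rho(t) = |t|$ the minimizer is the median, and with $\rho(t) = t^2$ it is the mean.

```python
def f(rho, support, probs, theta):
    """E rho(Y - theta) for a discrete Y with the given support and probabilities."""
    return sum(p * rho(y - theta) for y, p in zip(support, probs))


def argmin_on_grid(rho, support, probs, grid):
    """Grid point at which theta -> E rho(Y - theta) is smallest."""
    return min(grid, key=lambda theta: f(rho, support, probs, theta))


# Hypothetical skewed distribution: median 1, mean 0.4*0 + 0.4*1 + 0.2*10 = 2.4.
support = [0.0, 1.0, 10.0]
probs = [0.4, 0.4, 0.2]
grid = [k / 10 for k in range(-10, 111)]  # -1.0, -0.9, ..., 11.0

theta_abs = argmin_on_grid(abs, support, probs, grid)
theta_sq = argmin_on_grid(lambda t: t * t, support, probs, grid)
print(theta_abs, theta_sq)
```

Because the two minimizers differ for a skewed distribution, the choice of $\rho$ determines which notion of regression the point $\theta_x$ in (6) represents.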

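THEOREM 1. Suppose that the following are true:

1. Condition (A).

2. $\rho(\infty) = \rho(-\infty) = \infty$.

3. If $|\alpha| + \|\beta\| > 0$, then $P(\alpha + \beta'X = 0) < 1$, where
$\|\beta\|$ is the Euclidean norm of $\beta$.

Then as $n \to \infty$,

$$\hat\alpha_n \to \alpha_0, \quad \hat\beta_n \to \beta_0, \quad \text{a.s.} \tag{7}$$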


The following theorem deals with the case that $\rho$ may be bounded.

THEOREM 2. Suppose that the following are true:

1. Condition (A).

2. $\rho(\infty) = \rho(-\infty)$.

3. If $|\alpha| + \|\beta\| > 0$, then $P(\alpha + \beta'X = 0) = 0$.

Then (7) holds true.

Finally, we have the following theorem concerning the convergence
rate of $\hat\alpha_n$ and $\hat\beta_n$.

THEOREM 3. Suppose that

1. The conditions of Theorem 1 or Theorem 2 are true.

2. For any $\alpha \in R^1$ and $\beta \in R^d$, the moment generating function of
$\rho(Y - \alpha - \beta'X)$ is finite in some neighborhood of zero (the
neighborhood may depend on $\alpha$, $\beta$).

Then for arbitrarily given $\varepsilon > 0$, there exists a constant $c > 0$
independent of $n$ such that

$$P(|\hat\alpha_n - \alpha_0| \ge \varepsilon) = O(e^{-cn}), \qquad P(\|\hat\beta_n - \beta_0\| \ge \varepsilon) = O(e^{-cn}). \tag{8}$$

Before  entering  the  details  of  the  proof,  we  make  some  remarks 
about  the  conditions  of  the  theorems. 

1. Condition b seems quite natural from the practical point of view.
The continuity assumption a also seems reasonable. It can be weakened
to some extent at the expense of a much stronger condition on the
distribution of $(X,Y)$.

2. Condition 3 of Theorem 1 is, in fact, a consequence of e. Condition 3
of Theorem 2 holds when $X$ possesses a density.

3. Condition e is closely related to the meaning of the regression.
More precisely, the exact meaning of the regression determines the
class of functions $\rho$ which can be used in formulating the minimization
problem (1). For instance, when $\alpha_0 + \beta_0' x$ is the conditional median
of $Y$ given $X = x$ (median regression), we can choose $\rho(t) = |t|$, or any
$\rho$ for which $E\{\rho(Y - \theta) \mid X = x\}$ attains its minimum uniquely at
the conditional median. Likewise, when $\alpha_0 + \beta_0' x$ is the conditional
expectation of $Y$ given $X = x$, we can choose $\rho(t) = t^2$. An important
case is that in which the conditional distribution of $Y$ given $X = x$ is
symmetric and unimodal with center $\alpha_0 + \beta_0' x$. In this case, $\rho$ can
be chosen as any even function satisfying conditions a, b, d, and such
that $\rho(t) > \rho(0)$ when $t \ne 0$.


3. PROOF OF THE THEOREMS

We  give  the  detailed  proof  of  Theorem  1.  An  easy  modification  of 
the  argument  enables  us  to  prove  the  remaining  two  theorems. 

The main body of the proof of Theorem 1 is contained in the following
two lemmas.


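LEMMA 1. Suppose that the conditions of Theorem 1 are satisfied,
$\mathcal{B}$ is a bounded closed set in $R^{d+1}$, and $(\alpha_0,\beta_0') \in \mathcal{B}$.
Let $(\tilde\alpha_n,\tilde\beta_n)$ be a Borel measurable solution of the
restricted minimization problem

$$\sum_{i=1}^{n} \rho(Y_i - \alpha - \beta' X_i) = \min \tag{9}$$

with $(\alpha,\beta')$ being restricted to $\mathcal{B}$. Then as $n \to \infty$, we have

$$\tilde\alpha_n \to \alpha_0, \quad \tilde\beta_n \to \beta_0, \quad \text{a.s.} \tag{10}$$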


Denote by $S_R$ the closed ball $\{(\alpha,\beta'): \alpha^2 + \|\beta\|^2 \le R^2\}$ in $R^{d+1}$.
As mentioned earlier, $(\hat\alpha_n,\hat\beta_n')$ is the solution of the
unrestricted minimization problem (1).


LEMMA 2. Suppose that the conditions of Theorem 1 are satisfied.
Then there exists a constant $R$ such that with probability one we have
$(\hat\alpha_n,\hat\beta_n') \in S_R$ for $n$ sufficiently large.

It is readily seen that Theorem 1 follows from these two lemmas.
Indeed, take $\mathcal{B} = S_R$, where $R$ is the constant mentioned in Lemma 2.
Lemma 2 indicates that

$$P\big((\hat\alpha_n,\hat\beta_n') = (\tilde\alpha_n,\tilde\beta_n') \text{ for } n \text{ sufficiently large}\big) = 1,$$

which in turn entails

$$P\Big(\lim_{n\to\infty}(\hat\alpha_n - \tilde\alpha_n) = 0,\ \lim_{n\to\infty}(\hat\beta_n - \tilde\beta_n) = 0\Big) = 1.$$

From this and Lemma 1, (7) follows.


Proof of Lemma 1. Without loss of generality, we may assume
$\alpha_0 = 0$, $\beta_0 = 0$.

For any constant $\ell > 0$, define

$$A_\ell = [-\ell,\ell]^{d+1}.$$

Take $\ell$ large enough such that $\mathcal{B} \subset A_\ell$. Denote the $T = 2^{d+1}$ points
$(\pm\ell, \pm\ell, \ldots, \pm\ell)$ by $(a_1,b_1'), \ldots, (a_T,b_T')$. According to
conditions b, c, it can easily be shown that

$$0 \le \rho(y - \alpha - \beta' x) \le \sum_{i=1}^{T} \rho(y - a_i - b_i' x) \tag{11}$$

for any $(x',y) \in R^{d+1}$ and $(\alpha,\beta') \in A_\ell$. Define

$$Q(\alpha,\beta) = E\rho(Y - \alpha - \beta'X). \tag{12}$$
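Inequality (11) rests on the fact that $\alpha + \beta'x$ is linear in $(\alpha,\beta)$, so over the cube the residual $y - \alpha - \beta'x$ lies between its values at the vertices, and conditions b, c then bound $\rho$ of the residual by the larger of the two extreme vertex values. The following Python sketch spot-checks (11) on random draws. The particular $\rho$, the dimension, the value of $\ell$, and the sampling ranges are assumptions chosen for illustration; a random check is of course not a proof.

```python
import random
from itertools import product


def rho(t):
    """Example rho satisfying conditions a, b, c: truncated absolute value."""
    return min(abs(t), 2.0)


def dominated(rho, x, y, alpha, beta, ell):
    """Check 0 <= rho(y - alpha - beta'x) <= sum of rho over the cube's vertices."""
    lhs = rho(y - alpha - sum(b * xi for b, xi in zip(beta, x)))
    rhs = 0.0
    for vertex in product((-ell, ell), repeat=len(x) + 1):
        a_i, b_i = vertex[0], vertex[1:]
        rhs += rho(y - a_i - sum(b * xi for b, xi in zip(b_i, x)))
    return 0.0 <= lhs <= rhs


random.seed(0)
d, ell = 2, 1.5
all_ok = True
for _ in range(2000):
    x = [random.uniform(-5, 5) for _ in range(d)]
    y = random.uniform(-5, 5)
    alpha = random.uniform(-ell, ell)
    beta = [random.uniform(-ell, ell) for _ in range(d)]
    all_ok = all_ok and dominated(rho, x, y, alpha, beta, ell)
print(all_ok)
```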

By (11) and conditions a, d, it follows from the Lebesgue dominated
convergence theorem that $Q$ is continuous. By condition e it follows that

$$Q(\alpha,\beta) > Q(0,0) \quad \text{when } |\alpha| + \|\beta\| > 0. \tag{13}$$

Hence for arbitrarily given $\varepsilon > 0$, we have

$$q = \inf\{Q(\alpha,\beta): (\alpha,\beta') \in \mathcal{B} - A_\varepsilon\} - Q(0,0) > 0. \tag{14}$$

For any constant $M > 0$, denote by $I_M = I_M(X,Y)$ the indicator of the
event $\{(X',Y) \notin A_M\}$. Choose $\varepsilon_1 \in (0, q/4)$ and $M > 0$ large enough
such that

$$P\{(X',Y) \in A_M\} > 1 - \varepsilon_1, \tag{15}$$

$$E[I_M \rho(Y - \alpha - \beta'X)] < \varepsilon_1 \quad \text{for any } (\alpha,\beta') \in \mathcal{B}. \tag{16}$$

The existence of such a constant $M$ follows from condition d and (11).


Write those elements in $\{(X_1,Y_1), \ldots, (X_n,Y_n)\}$ which fall into the
set $A_M$, in the order of appearance, as

$$(X_1^*,Y_1^*), \ldots, (X_{n'}^*,Y_{n'}^*).$$

Evidently, these variables are conditionally i.i.d. given $n'$, with the
common distribution $(X,Y) \mid \{(X',Y) \in A_M\}$. Define the event

$$B_n = \{n' > (1 - 2\varepsilon_1)n\}.$$

Then by (15) and the strong law of large numbers, it follows that with
probability one $B_n$ occurs for $n$ sufficiently large. Put

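$$Q_M(\alpha,\beta) = E[(1 - I_M)\rho(Y - \alpha - \beta'X)].$$

Then $Q_M$ is continuous in $\alpha$, $\beta$. Find $\varepsilon_3' > 0$ sufficiently small, such that

$$|Q_M(\tilde\alpha,\tilde\beta) - Q_M(\alpha,\beta)| \le \varepsilon_1$$

when

$$(\alpha,\beta') \in \mathcal{B}, \quad (\tilde\alpha,\tilde\beta') \in \mathcal{B}, \quad |\tilde\alpha - \alpha| \le \varepsilon_3', \quad \|\tilde\beta - \beta\| \le \varepsilon_3'.$$

Put

$$\mathcal{C} = \sup\{|y - \alpha - \beta'x|: (x',y) \in A_M,\ (\alpha,\beta') \in \mathcal{B}\}. \tag{17}$$

Find $\varepsilon_2 > 0$ sufficiently small such that

$$\sup\{|\rho(r_2) - \rho(r_1)|: |r_1| \le \mathcal{C},\ |r_2| \le \mathcal{C},\ |r_2 - r_1| < \varepsilon_2\} < \varepsilon_1. \tag{18}$$

Find $\varepsilon_3 \in (0,\varepsilon_3')$ such that

$$|(\tilde\alpha + \tilde\beta'x) - (\alpha + \beta'x)| \le \varepsilon_2 \tag{19}$$

when

$$(\tilde\alpha,\tilde\beta') \in \mathcal{B}, \quad (\alpha,\beta') \in \mathcal{B}, \quad |\tilde\alpha - \alpha| \le \varepsilon_3, \quad \|\tilde\beta - \beta\| \le \varepsilon_3, \quad \|x\| \le Md.$$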

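Choose a finite set $\mathcal{B}_1 = \{(\alpha_j,\beta_j'): j = 1,\ldots,m\} \subset \mathcal{B} - A_\varepsilon$, such that
for any $(\alpha,\beta') \in \mathcal{B} - A_\varepsilon$ there exists $(\alpha_j,\beta_j') \in \mathcal{B}_1$ satisfying
$|\alpha_j - \alpha| \le \varepsilon_3$, $\|\beta_j - \beta\| \le \varepsilon_3$.

By the strong law of large numbers, with probability one we have

$$\frac{1}{n'}\sum_{i=1}^{n'} \rho(Y_i^* - \alpha_j - \beta_j'X_i^*) \ge E[\rho(Y - \alpha_j - \beta_j'X)(1 - I_M)] - \varepsilon_1 > E\rho(Y - \alpha_j - \beta_j'X) - 2\varepsilon_1 \ge Q(0,0) + q - 2\varepsilon_1, \quad j = 1,\ldots,m \tag{20}$$

for $n$ sufficiently large. Hence with probability one

$$\frac{1}{n}\sum_{i=1}^{n'} \rho(Y_i^* - \alpha_j - \beta_j'X_i^*) > (1 - 2\varepsilon_1)[Q(0,0) + q - 2\varepsilon_1] = Q(0,0) + \eta', \quad j = 1,\ldots,m \tag{21}$$

for $n$ sufficiently large, where

$$\eta' = (q - 2\varepsilon_1)(1 - 2\varepsilon_1) - 2\varepsilon_1 Q(0,0).$$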

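Now choose arbitrarily $(\alpha,\beta') \in \mathcal{B} - A_\varepsilon$. Find $(\alpha_j,\beta_j') \in \mathcal{B}_1$ such that
$|\alpha_j - \alpha| \le \varepsilon_3$, $\|\beta_j - \beta\| \le \varepsilon_3$. According to (18), (19), (21), we have

$$\sum_{i=1}^{n} \rho(Y_i - \alpha - \beta'X_i) \ge \sum_{i=1}^{n'} \rho(Y_i^* - \alpha - \beta'X_i^*)$$
$$\ge \sum_{i=1}^{n'} \rho(Y_i^* - \alpha_j - \beta_j'X_i^*) - \sum_{i=1}^{n'} |\rho(Y_i^* - \alpha_j - \beta_j'X_i^*) - \rho(Y_i^* - \alpha - \beta'X_i^*)|$$
$$\ge n[Q(0,0) + \eta'] - n'\varepsilon_1 \ge n[Q(0,0) + \eta], \tag{22}$$

where, since $n' \le n$, one may take $\eta = \eta' - \varepsilon_1$. Choose $\varepsilon_1 > 0$
sufficiently small such that $\eta > 0$. (22) should be understood as holding
simultaneously for all $(\alpha,\beta') \in \mathcal{B} - A_\varepsilon$ whenever (21) is true.
Therefore, with probability one we have

$$\min\Big\{\frac{1}{n}\sum_{i=1}^{n} \rho(Y_i - \alpha - \beta'X_i): (\alpha,\beta') \in \mathcal{B} - A_\varepsilon\Big\} \ge Q(0,0) + \eta \tag{23}$$

for $n$ sufficiently large.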

On the other hand, by the strong law of large numbers, with
probability one we have

$$\frac{1}{n}\sum_{i=1}^{n} \rho(Y_i) < Q(0,0) + \eta \tag{24}$$

for $n$ sufficiently large. From (23), (24), it follows that

$$P(|\tilde\alpha_n| \le \varepsilon,\ \|\tilde\beta_n\| \le \varepsilon, \text{ for } n \text{ sufficiently large}) = 1.$$

This concludes the proof of Lemma 1.

Proof of Lemma 2. Write $\bar S_1 = \{(\alpha,\beta'): |\alpha|^2 + \|\beta\|^2 = 1\}$. According
to condition 3 of Theorem 1, one can find $\varepsilon > 0$ such that

$$q = \inf\{P(|\alpha + \beta'X| > \varepsilon): (\alpha,\beta') \in \bar S_1\} > 0. \tag{25}$$

Find $M$ sufficiently large, such that $P(X \in A_M) > 1 - q/4$. Put
$\eta = \varepsilon[3(1 + dM)]^{-1}$. Choose a finite set $S_1^* \subset \bar S_1$ in such a way that
for any $\theta \in \bar S_1$ there exists $\tilde\theta \in S_1^*$ satisfying $\|\theta - \tilde\theta\| < \eta$. By (25) and
the strong law of large numbers, with probability one we have

$$\#\{i: 1 \le i \le n,\ |\alpha + \beta'X_i| > \varepsilon\} \ge nq/2 \quad \text{for any } (\alpha,\beta') \in S_1^* \tag{26}$$

for $n$ sufficiently large, where $\#(A)$ denotes the number of elements
of the set $A$.

By the strong law of large numbers, with probability one we have

$$\#\{i: 1 \le i \le n,\ X_i \in A_M\} > n(1 - q/4). \tag{27}$$

Choose a constant $K$ such that

$$qK/8 > Q(0,0) + 1,$$

where the function $Q$ is defined by (12). According to condition 2 of
Theorem 1, one can find $h$ so that $\rho(a) \ge K$ when $|a| \ge h$. Choose a
constant $R$ large enough so that

$$\varepsilon R/4 > h, \qquad P(|Y| < \varepsilon R/4) > 1 - q/8.$$

By the strong law of large numbers, with probability one we have

$$\#\{i: 1 \le i \le n,\ |Y_i| \le \varepsilon R/4\} \ge n(1 - q/8). \tag{28}$$

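Now choose arbitrarily a point $(\alpha,\beta')$ outside $S_R$. We have
$(\alpha,\beta') = r(\bar\alpha,\bar\beta')$: $r > R$, $(\bar\alpha,\bar\beta') \in \bar S_1$.

Assume that (26)-(28) are true. Then:

1. If $(\bar\alpha,\bar\beta') \in S_1^*$, from (26) we have

$$\#\{i: 1 \le i \le n,\ |\alpha + \beta'X_i| \ge R\varepsilon\} \ge nq/2. \tag{29}$$

From (28), (29), we obtain

$$\#\{i: 1 \le i \le n,\ |Y_i - \alpha - \beta'X_i| \ge 3R\varepsilon/4\} \ge 3nq/8. \tag{30}$$

2. If $(\bar\alpha,\bar\beta') \notin S_1^*$, then choose $(\alpha^*,\beta^{*\prime}) \in S_1^*$ satisfying

$$|\bar\alpha - \alpha^*| < \eta, \qquad \|\bar\beta - \beta^*\| \le \eta.$$

When $|\alpha^* + \beta^{*\prime}X_i| > \varepsilon$ and $X_i \in A_M$, we have

$$|\bar\alpha + \bar\beta'X_i| \ge \varepsilon - |(\alpha^* - \bar\alpha) + (\beta^* - \bar\beta)'X_i| \ge \varepsilon - |\alpha^* - \bar\alpha| - \|\beta^* - \bar\beta\| \cdot \|X_i\| \ge \varepsilon - \eta - \eta dM > \varepsilon/2.$$

Hence $|\alpha + \beta'X_i| > R\varepsilon/2$. From this, and (26)-(28), we get

$$\#\{i: 1 \le i \le n,\ |Y_i - \alpha - \beta'X_i| > R\varepsilon/4\} \ge nq/8. \tag{31}$$

Summarizing these two cases, we see that with probability one,

$$\frac{1}{n}\sum_{i=1}^{n} \rho(Y_i - \alpha - \beta'X_i) \ge qK/8 > Q(0,0) + 1$$

holds simultaneously for all $(\alpha,\beta')$ outside $S_R$, when $n$ is large enough.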
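Since (24) is true for $\eta = 1$, we see that with probability one

$$\inf\Big\{\frac{1}{n}\sum_{i=1}^{n} \rho(Y_i - \alpha - \beta'X_i): (\alpha,\beta') \notin S_R\Big\} > \frac{1}{n}\sum_{i=1}^{n} \rho(Y_i) \tag{32}$$

for $n$ sufficiently large. Therefore

$$P\{(\hat\alpha_n,\hat\beta_n') \in S_R \text{ for } n \text{ sufficiently large}\} = 1, \tag{33}$$

which proves Lemma 2.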

Proof of Theorem 2. Without loss of generality, assume $\alpha_0 = 0$,
$\beta_0 = 0$. No change is needed in the proof of Lemma 1. For a proof of
Lemma 2 under the conditions of Theorem 2, first note that $Q(0,0) < \rho(\infty)$
by conditions a, b, e and condition 2 of Theorem 2. From condition 3 of
Theorem 2, it is readily seen that for any $a < 1$ there exists $\varepsilon > 0$ such
that

$$\inf\{P(|\alpha + \beta'X| > \varepsilon): (\alpha,\beta') \in \bar S_1\} \ge a,$$

where $\bar S_1$ denotes the unit sphere $\{(\alpha,\beta'): |\alpha|^2 + \|\beta\|^2 = 1\}$.
Starting from these two facts and employing the argument used earlier,
it can be shown that there exists a constant $R$ such that

$$P\Big(\frac{1}{n}\sum_{i=1}^{n} \rho(Y_i - \alpha - \beta'X_i) > c \text{ for } (\alpha,\beta') \notin S_R \text{ simultaneously, when } n \text{ is large enough}\Big) = 1, \tag{34}$$

where $Q(0,0) < c < \rho(\infty)$. From (34) we obtain (32), hence (33).

Proof of Theorem 3. The proof follows from the following two lemmas:

LEMMA 1'. Under the conditions of Theorem 3, for arbitrarily given
$\varepsilon > 0$, there exists a constant $c > 0$ such that

$$P(|\tilde\alpha_n - \alpha_0| \ge \varepsilon) = O(e^{-cn}),$$

$$P(\|\tilde\beta_n - \beta_0\| \ge \varepsilon) = O(e^{-cn}),$$

where $\tilde\alpha_n$, $\tilde\beta_n$ are the same as in Lemma 1.

LEMMA 2'. Under the conditions of Theorem 3, there exist constants
$R > 0$ and $c > 0$ such that

$$P\big((\hat\alpha_n - \alpha_0,\ \hat\beta_n' - \beta_0') \notin S_R\big) = O(e^{-cn}).$$

These lemmas can be proved by employing the argument used in proving
Lemma 1 and Lemma 2, with the help of the following fact (see [5], p. 288):
Suppose that $\xi_1, \xi_2, \ldots$ is a sequence of i.i.d. random variables, $E\xi_1 = 0$
and $E\exp(t\xi_1) < \infty$ for $|t| < \delta$, $\delta > 0$. Then for arbitrarily given $\varepsilon > 0$
there exists a constant $c > 0$ such that

$$P\Big(\Big|\sum_{i=1}^{n} \xi_i / n\Big| \ge \varepsilon\Big) = O(e^{-cn}).$$
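The exponential rate in this fact can be observed in the simplest case. The Python sketch below takes $\xi_i = B_i - 1/2$ with $B_i$ i.i.d. Bernoulli(1/2), a choice made here purely for illustration, computes the tail probability exactly, and compares it with Hoeffding's bound $2e^{-2n\varepsilon^2}$. Hoeffding's inequality is one explicit instance of an $O(e^{-cn})$ bound for bounded variables; the fact quoted above covers the more general case of a finite moment generating function. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, exp


def exact_tail(n, eps):
    """P(|S_n / n - 1/2| >= eps) for S_n ~ Binomial(n, 1/2), computed exactly."""
    half = Fraction(1, 2)
    favorable = sum(comb(n, k) for k in range(n + 1)
                    if abs(Fraction(k, n) - half) >= eps)
    return Fraction(favorable, 2 ** n)


def hoeffding_bound(n, eps):
    """Hoeffding's bound 2 * exp(-2 * n * eps^2) for variables with range 1."""
    return 2.0 * exp(-2.0 * n * float(eps) ** 2)


eps = Fraction(1, 10)
for n in (10, 50, 100, 200, 400):
    print(n, float(exact_tail(n, eps)), hoeffding_bound(n, eps))
```

Exact rational arithmetic is used for the tail so that boundary cases such as $|k/n - 1/2| = \varepsilon$ are counted correctly rather than lost to floating-point rounding.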